fix: handle compaction truncation and output budgets by kermanx · Pull Request #267 · MoonshotAI/kimi-code

kermanx · 2026-06-01T06:48:49Z

Related Issue

No linked issue. This fixes compaction diagnostics and completion budget propagation issues found while handling long-context model requests.

Problem

Full compaction used a generic local error when the model response was truncated, so exhausted retries surfaced as a plain Error instead of a useful compaction-specific failure. The compaction path could also try to apply a zero completion budget when the selected model's context window was unknown. Separately, only some providers implemented the shared completion-budget hook, so ordinary turns and compaction did not consistently send an explicit output-token budget across supported provider backends.
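
A minimal sketch of the named error and the shared retry path. It assumes a hypothetical `summarize` callback that reports whether the summary was cut off. `SummaryResult`, `CompactionTruncationError`, `compactWithRetries`, and the drop-the-oldest-half reduction are illustrative names and choices, not the code in packages/agent-core/src/agent/compaction/full.ts.

```typescript
// Hypothetical result of one summarization call; `truncated` is true when the
// model stopped because it ran out of output tokens.
interface SummaryResult {
  text: string;
  truncated: boolean;
}

// A named error lets callers tell "compaction kept getting truncated" apart
// from an arbitrary failure.
class CompactionTruncationError extends Error {
  readonly attempts: number;

  constructor(attempts: number) {
    super(`compaction summary was truncated on all ${attempts} attempt(s)`);
    this.name = "CompactionTruncationError";
    this.attempts = attempts;
  }
}

// Retries a truncated summary on a reduced input (here: drop the oldest half),
// the same recovery a context-overflow failure would get. Overflow handling
// itself is omitted from this sketch.
async function compactWithRetries(
  messages: string[],
  summarize: (msgs: string[]) => Promise<SummaryResult>,
  maxAttempts = 3,
): Promise<string> {
  let input = messages;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await summarize(input);
    if (!result.truncated) return result.text;
    input = input.slice(Math.floor(input.length / 2));
  }
  throw new CompactionTruncationError(maxAttempts);
}
```

When retries are exhausted, the caller sees a `CompactionTruncationError` carrying the attempt count instead of a plain `Error`.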

What changed

Added a named compaction truncation error and kept truncated summaries on the same reduction-and-retry path as context overflow.
Avoided applying a compaction completion budget when the model context window is unknown.
Added provider-level completion-budget support for Anthropic, OpenAI Chat Completions, OpenAI Responses, Google GenAI, and Vertex AI.
Added regression coverage for compaction unknown-context behavior and provider request bodies.
Merged the changeset into a single entry covering agent-core, kosong, and the CLI bundle.
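
The unknown-context guard and the optional provider hook fit together roughly as follows. The sketch is modeled on the compaction snippet quoted in the first review round. `ChatProvider` here is a pared-down stand-in, and `FakeProvider` and `providerForCompaction` are hypothetical names.

```typescript
// Pared-down provider interface: the completion-budget hook is optional,
// so providers that do not implement it are used unchanged.
interface ChatProvider {
  readonly maxTokens?: number;
  withMaxCompletionTokens?(maxCompletionTokens: number): ChatProvider;
}

// Hypothetical provider that returns a new instance carrying the budget.
class FakeProvider implements ChatProvider {
  readonly maxTokens: number | undefined;

  constructor(maxTokens?: number) {
    this.maxTokens = maxTokens;
  }

  withMaxCompletionTokens(maxCompletionTokens: number): FakeProvider {
    return new FakeProvider(maxCompletionTokens);
  }
}

// A context window of 0 (or NaN) means "unknown": keep the base provider
// instead of sending a zero completion budget.
function providerForCompaction(base: ChatProvider, maxContextTokens: number): ChatProvider {
  if (!(maxContextTokens > 0)) return base;
  return base.withMaxCompletionTokens?.(maxContextTokens) ?? base;
}
```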

Checklist

I have read the CONTRIBUTING document.
I have linked a related issue, or explained the problem above.
I have added tests that prove my feature works.
Ran gen-changesets skill, or this PR needs no changeset.
Ran gen-docs skill, or this PR needs no doc update.

Validation:

pnpm exec vitest run packages/agent-core/test/agent/compaction/full.test.ts packages/kosong/test/openai-legacy.test.ts packages/kosong/test/openai-responses.test.ts packages/kosong/test/google-genai.test.ts packages/kosong/test/anthropic.test.ts
pnpm --filter @moonshot-ai/agent-core run typecheck
pnpm --filter @moonshot-ai/kosong run typecheck

changeset-bot · 2026-06-01T06:48:53Z

🦋 Changeset detected

Latest commit: 534844a

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

Name	Type
@moonshot-ai/agent-core	Patch
@moonshot-ai/kosong	Patch
@moonshot-ai/kimi-code	Patch

pkg-pr-new · 2026-06-01T06:49:36Z

pnpm dlx https://pkg.pr.new/@moonshot-ai/kimi-code@534844a

npx https://pkg.pr.new/@moonshot-ai/kimi-code@534844a

commit: 534844a

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1fe86cf7fb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: 6a15625d40

chatgpt-codex-connector · 2026-06-01T07:17:19Z

+  withMaxCompletionTokens(maxCompletionTokens: number): AnthropicChatProvider {
+    return this._withGenerationKwargs({
+      max_tokens: resolveDefaultMaxTokens(this._model, maxCompletionTokens),
+    });


Preserve lower Anthropic max token caps

When an Anthropic model alias sets maxOutputSize, provider construction passes it as defaultMaxTokens. Now that this method exists, however, the agent's per-request completion-budget path calls withMaxCompletionTokens(maxContextSize) for both normal and compaction requests. Because this assignment recomputes max_tokens from the larger context-window cap, a user-configured lower output limit is raised back to the model ceiling, so aliases intended to limit cost or force shorter responses no longer do so. Please keep the existing max_tokens when it is below maxCompletionTokens.
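
One way to follow this suggestion, sketched under the assumption that generation kwargs live in a plain record and that `max_tokens` is the Anthropic field being set. The helper takes the smaller of the existing and requested caps. `withPreservedMaxTokens` is hypothetical, not the kosong implementation.

```typescript
// Generation kwargs as a plain record (an assumption; the provider's real
// storage may differ).
type GenerationKwargs = Record<string, unknown>;

// Returns a copy whose max_tokens is whichever of the existing positive cap
// and the requested budget is smaller; the input record is not mutated.
function withPreservedMaxTokens(
  kwargs: GenerationKwargs,
  maxCompletionTokens: number,
): GenerationKwargs {
  const existing = kwargs["max_tokens"];
  const maxTokens =
    typeof existing === "number" && existing > 0
      ? Math.min(existing, maxCompletionTokens)
      : maxCompletionTokens;
  return { ...kwargs, max_tokens: maxTokens };
}
```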

chatgpt-codex-connector · 2026-06-01T07:17:19Z

+      const maxContextTokens = this.agent.config.modelCapabilities.max_context_tokens;
+      const provider =
+        maxContextTokens > 0
+          ? baseProvider.withMaxCompletionTokens?.(maxContextTokens) ?? baseProvider
+          : baseProvider;


Honor completion-budget env caps during compaction

When an operator sets KIMI_MODEL_MAX_COMPLETION_TOKENS (or the legacy env var) to a lower hard cap, or sets it non-positive to opt out, normal turns go through resolveCompletionBudget/applyCompletionBudget, but this compaction path always clones the provider with max_context_tokens. On large-window models this means compaction still serializes a much larger completion budget than the configured cap (or sends one despite opt-out), so the requests operators explicitly capped can fail or become unexpectedly expensive during compaction. Please route compaction through the same budget resolution or skip this clone when the budget is disabled.
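
The following sketch routes both request kinds through one resolver. It assumes the semantics described in the comment: an unset variable means "use the context window", a positive value is a hard cap, and a non-positive value opts out. `resolveBudget` is hypothetical and reads only `KIMI_MODEL_MAX_COMPLETION_TOKENS` (the legacy variable's name is not given here). The env record is passed in, so the sketch does not depend on Node globals.

```typescript
// Hypothetical resolver shared by normal turns and compaction.
// Returns undefined when no completion budget should be sent.
function resolveBudget(
  maxContextTokens: number,
  env: Record<string, string | undefined>,
): number | undefined {
  const raw = env["KIMI_MODEL_MAX_COMPLETION_TOKENS"]?.trim();
  let hardCap: number | undefined;
  if (raw) {
    const parsed = Math.floor(Number(raw));
    if (Number.isFinite(parsed)) {
      // A non-positive value means the operator opted out of sending a budget.
      if (parsed <= 0) return undefined;
      hardCap = parsed;
    }
  }
  // Unknown context window: send only an explicit operator cap, if any
  // (an assumption about the desired behavior).
  if (!(maxContextTokens > 0)) return hardCap;
  return hardCap === undefined ? maxContextTokens : Math.min(hardCap, maxContextTokens);
}
```

Compaction would then call the same resolver and skip `withMaxCompletionTokens` when it returns `undefined`.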

chatgpt-codex-connector · 2026-06-01T07:17:19Z

+  withMaxCompletionTokens(maxCompletionTokens: number): OpenAIResponsesChatProvider {
+    return this.withGenerationKwargs({ max_output_tokens: maxCompletionTokens });


Clamp OpenAI output caps below context size

For OpenAI Responses aliases whose max_context_size is larger than the model's maximum output size, adding this method activates the generic completion-budget path and sends that context window as max_output_tokens on every turn. The repo already treats upstream errors such as "max_output_tokens must not exceed 8192" as plain APIStatusErrors rather than context-overflow retries, so a correctly configured large-context OpenAI model can start failing before generation instead of simply allowing a large completion. Please clamp to the provider/model output ceiling, or preserve an existing lower cap, before serializing this value.
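
Here is a minimal clamp, assuming the provider can look up a per-model output ceiling. The table, the model name, and `clampToOutputCeiling` are hypothetical placeholders; the 8192 value only echoes the error message quoted above.

```typescript
// Hypothetical per-model output ceilings; real values must come from the
// provider's model documentation or capability catalog.
const OUTPUT_CEILINGS: Record<string, number> = {
  "example-8k-output-model": 8192,
};

// Clamps a context-sized budget to the model's output ceiling when one is known.
function clampToOutputCeiling(model: string, requested: number): number {
  const ceiling: number | undefined = OUTPUT_CEILINGS[model];
  return ceiling !== undefined ? Math.min(requested, ceiling) : requested;
}
```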

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: 5c160ae550

chatgpt-codex-connector · 2026-06-01T08:21:58Z

+  withMaxCompletionTokens(maxCompletionTokens: number): OpenAIResponsesChatProvider {
+    return this.withGenerationKwargs({ max_output_tokens: maxCompletionTokens });
+  }


Preserve explicit Responses output caps

When an OpenAI Responses provider is constructed with maxOutputTokens (or already has max_output_tokens via withGenerationKwargs), normal agent turns now always pass through applyCompletionBudget, so this method overwrites that explicit cap with the generic context-sized budget (often 32000 or the model context window). In those configurations a user/operator limit meant to bound cost or avoid provider max-output errors is silently raised; this hook should keep the existing lower max_output_tokens instead of replacing it unconditionally.

chatgpt-codex-connector · 2026-06-01T08:21:58Z

+      max_tokens:
+        this._generationKwargs.max_tokens ??
+        resolveDefaultMaxTokens(this._model, maxCompletionTokens),


Apply Anthropic hard caps below the default max_tokens

When KIMI_MODEL_MAX_COMPLETION_TOKENS is set below Anthropic's constructor default (for example 8192 on a Claude 4 alias whose default max_tokens is 64000/128000), normal turns and compaction now call withMaxCompletionTokens(8192), but this implementation keeps the existing higher max_tokens and ignores the hard cap. That means the env limit documented as an explicit cap is not enforced for Anthropic requests unless max_tokens was first cleared; clamp the existing value down when it exceeds maxCompletionTokens while still preserving lower user caps.
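
The fix needs to distinguish a constructor default from an explicitly configured cap. This sketch mirrors the ternary in the later revision quoted below (`existingCap`, `requestedCap`, `_explicitMaxTokens`). The `MaxTokensState` shape and `resolveAnthropicMaxTokens` are hypothetical.

```typescript
// Hypothetical view of the provider's max_tokens state.
interface MaxTokensState {
  maxTokens?: number; // current max_tokens, from the constructor default or the user
  explicit: boolean; // true when the user/alias configured max_tokens directly
}

// Explicit caps are kept as configured; constructor defaults are clamped
// down to the requested cap; an unset value takes the requested cap.
function resolveAnthropicMaxTokens(state: MaxTokensState, requestedCap: number): number {
  const existing = state.maxTokens;
  if (existing === undefined || state.explicit) return existing ?? requestedCap;
  return Math.min(existing, requestedCap);
}
```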

chatgpt-codex-connector · 2026-06-01T08:21:58Z

+  withMaxCompletionTokens(maxCompletionTokens: number): OpenAILegacyChatProvider {
+    return this.withGenerationKwargs({ max_tokens: maxCompletionTokens });


Preserve explicit Chat Completions token caps

When an OpenAI Chat Completions provider is constructed with maxTokens (or has max_tokens set via withGenerationKwargs), ordinary agent turns now route through applyCompletionBudget, so this method replaces that explicit limit with the generic completion budget derived from the context window or fallback. In direct SDK/provider usage that turns a deliberately low cap into a much larger request budget, which can increase cost or defeat tests/configurations that rely on truncation; keep an existing lower max_tokens instead of overwriting it unconditionally.

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: 534844a62e

chatgpt-codex-connector · 2026-06-01T08:54:21Z

  }

+  withMaxCompletionTokens(maxCompletionTokens: number): GoogleGenAIChatProvider {
+    return this.withGenerationKwargs({ max_output_tokens: maxCompletionTokens });


Clamp Gemini budgets to output limits

When a google-genai/vertexai alias has maxContextSize above Gemini's output limit (the env-model default context is 262144), applyCompletionBudget passes that context-sized cap into this hook, and this line serializes the same value as max_output_tokens. Google documents Gemini 2.5 Pro's output limit as 65,536 tokens, so the default 256K-context configuration can make every normal turn or compaction request invalid instead of merely allowing a large completion; please clamp this to the provider/model output ceiling or preserve a lower existing cap.

chatgpt-codex-connector · 2026-06-01T08:54:21Z

  }

+  withMaxCompletionTokens(maxCompletionTokens: number): OpenAILegacyChatProvider {
+    return this.withGenerationKwargs({ max_tokens: maxCompletionTokens });


Use max_completion_tokens for o-series chat

When this provider is used with an o-series Chat Completions model such as o1/o3 (the local capability catalog explicitly matches those models), ordinary agent turns now always call this hook through applyCompletionBudget, so every request includes max_tokens. OpenAI's Chat Completions docs state that max_tokens is not compatible with o-series models and is superseded by max_completion_tokens (https://platform.openai.com/docs/api-reference/chat/create), so these configurations will start failing before generation; please use the compatible field for o-series models or skip the hook there.
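
A sketch of choosing the field per model. `max_tokens` and `max_completion_tokens` are the Chat Completions parameter names cited in the comment. Detecting o-series models with the `/^o\d/` name pattern is an assumption standing in for the repo's capability catalog, and `completionBudgetKwargs` is hypothetical.

```typescript
// Assumption: o-series models are recognized by name (o1, o3-mini, ...);
// the repo's own capability catalog may match differently.
function isOSeriesModel(model: string): boolean {
  return /^o\d/.test(model);
}

// Builds the output-budget kwargs with the field the model accepts.
function completionBudgetKwargs(model: string, maxCompletionTokens: number): Record<string, number> {
  const field = isOSeriesModel(model) ? "max_completion_tokens" : "max_tokens";
  return { [field]: maxCompletionTokens };
}
```

Sending `max_completion_tokens` everywhere is another option for the official API, but OpenAI-compatible endpoints may only understand `max_tokens`, which is why this sketch switches per model.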

chatgpt-codex-connector · 2026-06-01T08:54:21Z

+      max_tokens:
+        existingCap === undefined || this._explicitMaxTokens
+          ? existingCap ?? requestedCap
+          : Math.min(existingCap, requestedCap),


Keep Anthropic thinking budgets below max_tokens

For fixed-budget Anthropic thinking (for example withThinking('high') on pre-adaptive Claude models) plus a lower completion cap such as KIMI_MODEL_MAX_COMPLETION_TOKENS=8192, this branch lowers max_tokens to the cap but leaves the existing thinking.budget_tokens at 32000. Anthropic's extended-thinking docs require budget_tokens to be less than max_tokens outside the interleaved-tools exception (https://platform.claude.com/docs/en/build-with-claude/extended-thinking), so simple Anthropic requests in that configuration become invalid; please lower/disable the thinking budget or avoid applying an incompatible cap.
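
One way to keep the request valid, sketched under stated assumptions. The `thinking` object follows the `{ type: "enabled", budget_tokens }` shape from Anthropic's Messages API, and 1024 is the minimum budget stated in Anthropic's docs. Giving thinking half of `max_tokens` is an arbitrary choice. `fitThinkingBudget` is hypothetical.

```typescript
// Shape of Anthropic's `thinking` request field for fixed-budget extended thinking.
type ThinkingConfig =
  | { type: "enabled"; budget_tokens: number }
  | { type: "disabled" };

// Minimum thinking budget stated in Anthropic's extended-thinking docs.
const MIN_THINKING_BUDGET = 1024;

// Ensures budget_tokens < maxTokens. If the budget no longer fits, give thinking
// half of maxTokens (an arbitrary split), or disable it when that is below the minimum.
function fitThinkingBudget(thinking: ThinkingConfig, maxTokens: number): ThinkingConfig {
  if (thinking.type !== "enabled" || thinking.budget_tokens < maxTokens) return thinking;
  const lowered = Math.floor(maxTokens / 2);
  return lowered >= MIN_THINKING_BUDGET
    ? { type: "enabled", budget_tokens: lowered }
    : { type: "disabled" };
}
```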

chatgpt-codex-connector · 2026-06-01T08:54:21Z

  }

+  withMaxCompletionTokens(maxCompletionTokens: number): OpenAILegacyChatProvider {
+    return this.withGenerationKwargs({ max_tokens: maxCompletionTokens });


Leave room for Chat Completions prompts

For non-o-series Chat Completions aliases, the default budget path passes the full configured context window into this hook (for env models that is 262144 unless overridden), so even a tiny prompt is sent with max_tokens equal to the whole context. OpenAI-compatible chat endpoints count prompt plus max_tokens against the model context, so these requests can be rejected immediately instead of letting the model use the remaining space; please subtract the serialized prompt size or clamp to a safe output ceiling before setting max_tokens.
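
A sketch of leaving room for the prompt. The four-characters-per-token estimate and the 1024-token safety margin are assumptions; a real implementation would use the provider's tokenizer or reported usage. `remainingOutputBudget` is hypothetical.

```typescript
// Crude token estimate (about 4 characters per token); an assumption, not a tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Output budget = context window - estimated prompt - safety margin, optionally
// clamped to a model output ceiling. Returns undefined when nothing is left.
function remainingOutputBudget(
  contextWindow: number,
  promptText: string,
  outputCeiling?: number,
  safetyMargin = 1024,
): number | undefined {
  const remaining = contextWindow - estimateTokens(promptText) - safetyMargin;
  if (remaining <= 0) return undefined;
  return outputCeiling !== undefined ? Math.min(remaining, outputCeiling) : remaining;
}
```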

fix(agent-core): name truncated compaction errors

1fe86cf

chatgpt-codex-connector Bot reviewed Jun 1, 2026

Comment thread on packages/agent-core/src/agent/compaction/full.ts (outdated)

fix: handle compaction truncation and output budgets

6a15625

RealKai42 changed the title from "fix(agent-core): name truncated compaction errors" to "fix: handle compaction truncation and output budgets" on Jun 1, 2026

chatgpt-codex-connector Bot reviewed Jun 1, 2026

fix: honor compaction output budgets

5c160ae

chatgpt-codex-connector Bot reviewed Jun 1, 2026

fix: honor anthropic completion hard caps

534844a

chatgpt-codex-connector Bot reviewed Jun 1, 2026

kermanx merged commit e2e1728 into main Jun 1, 2026
8 checks passed

kermanx deleted the xtr/named-compaction-truncation branch June 1, 2026 09:25

github-actions Bot mentioned this pull request Jun 1, 2026

ci: release packages #237

Open

kermanx commented Jun 1, 2026 (edited by RealKai42)